Single node clickhouse init #6903
Conversation
Looks great!
clickhouse-admin/api/src/lib.rs
/// The single-node server is distinct from both the multi-node servers
/// and their keepers. The sole purpose of this API is to serialize database
/// initialization requests from reconfigurator execution. Multi-node clusters
/// must eventually implement a similar interface, but the implementation will
Nit: I'd probably avoid speculating here about implementation timing and details, and only state that multi-node clusters must implement a similar interface through clickhouse-admin-server.
Good call, fixed in 2ddcbd9.
let ctx = rqctx.context();
let initialized = ctx.db_initialized();
let mut initialized = initialized.lock().await;
if !*initialized {
What happens if we want to update the schema? Is there any harm in just letting initialization run through every time? It's supposed to be idempotent, right?
CC @bnaecker
Yeah, schema updates are idempotent, and also a no-op if the version in the database is at least as high as that on the client side.
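To make that no-op behavior concrete, here is a minimal sketch of the guard described above; the names (SchemaClient, read_schema_version, apply_schema_upgrades) are illustrative stand-ins, not the actual oximeter client API:

```rust
// Illustrative sketch only: hypothetical trait and method names.
trait SchemaClient {
    async fn read_schema_version(&self) -> anyhow::Result<u64>;
    async fn apply_schema_upgrades(&self, to_version: u64) -> anyhow::Result<()>;
}

async fn ensure_schema(
    client: &impl SchemaClient,
    desired_version: u64,
) -> anyhow::Result<()> {
    // No-op if the database is already at (or past) the client's version.
    if client.read_schema_version().await? >= desired_version {
        return Ok(());
    }
    // Otherwise apply the (idempotent) upgrades up to `desired_version`.
    client.apply_schema_upgrades(desired_version).await
}
```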
We talked a little about this in an update meeting a few weeks ago; my understanding was that letting initialization run each time and therefore update the schema would represent a change to our current policy of not performing automatic schema updates. This was one of the reasons for preferring a server-side approach rather than initializing the database on every client connection. I'm fine either way; it's a trivial change to the code, but a potentially less trivial change in policy.
Thanks for reminding me of that discussion, Alex. However:
- Currently the schema version is determined by a constant in the code, so it won't be updated at all unless the code changes.
- If the admin server crashes or the sled is rebooted, the in-memory state will be lost and the schema will be re-initialized anyway.
I think what we'll want to do in a follow-up is to make schema updates part of the planner, and then only execute them when the plan changes.
Oh, and the API call should then take the schema version to update from the executor.
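For illustration, a possible shape for that follow-up endpoint, where the executor supplies the schema version instead of the admin server relying on a compiled-in constant. This is purely a sketch; the type and method names (InitDbBody, AdminContext, init_db_to_version) are hypothetical and not part of this PR:

```rust
use dropshot::{
    endpoint, HttpError, HttpResponseUpdatedNoContent, RequestContext, TypedBody,
};
use schemars::JsonSchema;
use serde::Deserialize;

// Hypothetical request body: the executor names the schema version to
// initialize or upgrade to.
#[derive(Deserialize, JsonSchema)]
struct InitDbBody {
    schema_version: u64,
}

// Hypothetical server context; `init_db_to_version` stands in for whatever
// performs the actual (idempotent) initialization work.
struct AdminContext;

impl AdminContext {
    async fn init_db_to_version(&self, _version: u64) -> anyhow::Result<()> {
        Ok(())
    }
}

/// Initialize the single-node database at the requested schema version.
#[endpoint { method = PUT, path = "/init" }]
async fn init_db(
    rqctx: RequestContext<AdminContext>,
    body: TypedBody<InitDbBody>,
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
    let version = body.into_inner().schema_version;
    rqctx
        .context()
        .init_db_to_version(version)
        .await
        .map_err(|e| HttpError::for_internal_error(e.to_string()))?;
    Ok(HttpResponseUpdatedNoContent())
}
```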
let admin_url = format!("http://{admin_addr}");
let log = opctx.log.new(slog::o!("admin_url" => admin_url.clone()));
let client = ClickhouseSingleClient::new(&admin_url, log.clone());
if let Err(e) = client.init_db().await {
Ok, so I know that I made this change. But afterwards I've been messing with other execution steps and think we should instead return the error here and then convert it into a warning in the execution step. The benefit of doing it this way is that it shows up in the background task as a warning in OMDB. I'll put a comment next to where this should go.
Done in c6e98a5.
    &opctx,
    &blueprint.blueprint_zones,
)
.await?;
We should put the warning here. Instead of using ?, we should do the following:
if let Err(e) = clickhouse::deploy_single_node(
    &opctx,
    &blueprint.blueprint_zones,
)
.await
{
    return StepWarning::new((), e.to_string()).into();
}
Depends on #6894, replaces #6878, fixes #6826 on the server side.

Adds a new Dropshot server, clickhouse-admin-single, analogous to clickhouse-admin-keeper and clickhouse-admin-server (which were split off from clickhouse-admin in #6837). Its sole purpose is to initialize the single-node ClickHouse database with the current Oximeter schema. We use a single in-memory lock (a tokio::sync::Mutex) to serialize initialization requests. Multi-node ClickHouse clusters will need something analogous as a follow-up.

~~Still needs testing. My a4x2 cluster is currently being uncooperative, and so this has not been through a successful "spin-up, expunge ClickHouse, regenerate" cycle. Also missing unit tests.~~ Tested on a4x2 by expunging the ClickHouse zone in a new blueprint, setting that as the target and verifying that the zone is expunged, regenerating a new blueprint, setting that as the target and ensuring that the new zone is instantiated, and ensuring that Oximeter correctly stops inserting while the db is unavailable but automatically resumes insertions once the new one is up & initialized.

~~The failing test cockroachdb::test::test_ensure_preserve_downgrade_option is caused by the fact that we don't currently spin up the new clickhouse-admin-single server in a test environment, so when a blueprint containing a ClickHouse zone is executed, the initialization step fails.~~ Thanks to @andrewjstone for fixing the last failing test (#7059).
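As a quick illustration of the serialization scheme described above, concurrent initialization requests queue on a tokio::sync::Mutex and only the first one through does the work. This is a simplified sketch, not the actual clickhouse-admin-single code; the helper names are made up:

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

// Simplified sketch of the context shared by all request handlers.
#[derive(Clone)]
struct ServerContext {
    db_initialized: Arc<Mutex<bool>>,
}

impl ServerContext {
    async fn init_db(&self) -> anyhow::Result<()> {
        // Holding the lock across the await serializes concurrent requests.
        let mut initialized = self.db_initialized.lock().await;
        if !*initialized {
            run_schema_init().await?;
            *initialized = true;
        }
        Ok(())
    }
}

// Placeholder for the real call that applies the Oximeter schema.
async fn run_schema_init() -> anyhow::Result<()> {
    Ok(())
}
```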